Automatic Text Difficulty Classifier - Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching

نویسندگان

  • Pedro Curto
  • Nuno J. Mamede
  • Jorge Baptista
چکیده

This paper describes a system to assist the selection of adequate reading materials to support European Portuguese teaching, especially as second language, while highlighting the key challenges on the selection of linguistic features for text difficulty (readability) classification. The system uses existing Natural Language Processing (NLP) tools to extract linguistic features from texts, which are then used by an automatic readability classifier. Currently, 52 features are extracted: parts-of-speech (POS), syllables, words, chunks and phrases, averages and frequencies, and some extra features. A classifier was created using these features and a corpus, previously annotated by readability level, using a five-levels language classification official standard for Portuguese as Second Language. In a five-levels (from A1 to C1) scenario, the best-performing learning algorithm (LogitBoost) achieved an accuracy of 75.11% with a root mean square error (RMSE) of 0.269. In a three-levels (A, B and C) scenario, the best-performing learning algorithm (C4.5 grafted) achieved 81.44% accuracy with a RMSE of 0.346.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Systematic literature review of fuzzy logic based text summarization

Information Overloadrq  is not a new term but with the massive development in technology which enables anytime, anywhere, easy and unlimited access; participation & publishing of information has consequently escalated its impact. Assisting userslq    informational searches with reduced reading surfing time by extracting and evaluating accurate, authentic & relevant information are the primary c...

متن کامل

Automatic readability classifier for European Portuguese

This paper describes a system that automatically classifies text readability for European Portuguese, while highlighting the key challenges on language features’ selection and text classification. To this goal, the system uses existing Natural Language Processing (NLP) tools to extract linguistic features from texts, which are then used by an automatic readability classifier. Currently, the sys...

متن کامل

Concatenative speech synthesis for European Portuguese

This paper describes our on-going work in the area of text-tospeech synthesis, specifically on concatenative techniques. Our preliminary work consisted in investigating the current trends in concatenative synthesis and the problems that could arise when we apply the existing state-of-the art solutions to the specific case of European Portuguese. Our ultimate goal is to develop a text-to-speech ...

متن کامل

Effect of Gender – Oriented Content Familiarity and Test Type on Reading Comprehension

One important issue that any material or test designer should bear in mind is to be careful about developing teaching and testing reading materials which would not be to the benefit of a group of readers or examinees and to the detriment of some others. One of the factors which may cause difference between the performances of English learners or test takers is the selection of reading tests wit...

متن کامل

The Impact of Contextual Clue Selection on Inference

Linguistic information can be conveyed in the form of speech and written text, but it is the content of the message that is ultimately essential for higher-level processes in language comprehension, such as making inferences and associations between text information and knowledge about the world. Linguistically, inference is the shovel that allows receivers to dig meaning out from the text with...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015